10 research outputs found

    A practical guide to multi-objective reinforcement learning and planning

    Get PDF
    Real-world sequential decision-making tasks are generally complex, requiring trade-offs between multiple, often conflicting, objectives. Despite this, the majority of research in reinforcement learning and decision-theoretic planning either assumes only a single objective, or that multiple objectives can be adequately handled via a simple linear combination. Such approaches may oversimplify the underlying problem and hence produce suboptimal results. This paper serves as a guide to the application of multi-objective methods to difficult problems, and is aimed at researchers who are already familiar with single-objective reinforcement learning and planning methods and who wish to adopt a multi-objective perspective on their research, as well as practitioners who encounter multi-objective decision problems in practice. It identifies the factors that may influence the nature of the desired solution, and illustrates by example how these influence the design of multi-objective decision-making systems for complex problems.
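
    A minimal sketch may clarify why the linear-combination shortcut the abstract criticises can discard reasonable trade-offs. The three policies and their two-objective returns below are invented for illustration, not taken from the paper; policy C is Pareto-optimal, yet no non-negative linear weighting ever selects it:

        import numpy as np

        # Hypothetical expected returns of three policies on two objectives;
        # the numbers are invented for illustration.
        returns = {
            "A": np.array([1.0, 0.0]),
            "B": np.array([0.0, 1.0]),
            "C": np.array([0.4, 0.4]),
        }

        def dominates(u, v):
            """True if u Pareto-dominates v: at least as good everywhere, better somewhere."""
            return bool(np.all(u >= v) and np.any(u > v))

        pareto = [p for p, u in returns.items()
                  if not any(dominates(v, u) for q, v in returns.items() if q != p)]
        print("Pareto-optimal:", pareto)   # A, B and C are all Pareto-optimal

        # Yet no non-negative weight vector ever picks C: w.C = 0.4*(w1 + w2),
        # while max(w.A, w.B) = max(w1, w2) >= 0.5*(w1 + w2). A purely linear
        # scalarisation silently discards this compromise policy.
        for w in (np.array([0.9, 0.1]), np.array([0.5, 0.5]), np.array([0.1, 0.9])):
            best = max(returns, key=lambda p: float(w @ returns[p]))
            print(f"weights {w}: best policy {best}")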

    A Virtual Maze Game to Explain Reinforcement Learning

    No full text
    We demonstrate how Virtual Reality can explain the basic concepts of Reinforcement Learning through an interactive maze game. A player takes the role of an autonomous learning agent and must learn the shortest path to a hidden treasure through experience. This application visualises the learning process of Watkins' Q(λ), one of the fundamental algorithms in the field. A video can be found at https://youtu.be/sLJRiUBhQqM.
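
    Since the demo's learning process is driven by Watkins' Q(λ), a minimal tabular sketch of that update may make the visualisation concrete. The maze environment and its action set are assumed to be supplied by the caller, and the hyperparameter values are illustrative:

        import random
        from collections import defaultdict

        # Minimal tabular Watkins' Q(lambda); hyperparameters are illustrative.
        ALPHA, GAMMA, LAMBDA, EPSILON = 0.1, 0.95, 0.9, 0.1
        Q = defaultdict(float)          # Q[(state, action)] -> value estimate
        E = defaultdict(float)          # eligibility traces

        def act(s, actions):
            """Epsilon-greedy behaviour policy."""
            if random.random() < EPSILON:
                return random.choice(actions)
            return max(actions, key=lambda a: Q[(s, a)])

        def update(s, a, r, s2, a2, actions):
            """One Q(lambda) backup for (s, a, r, s2); a2 is the action taken in s2."""
            a_star = max(actions, key=lambda b: Q[(s2, b)])   # greedy next action
            delta = r + GAMMA * Q[(s2, a_star)] - Q[(s, a)]
            E[(s, a)] += 1.0                                  # accumulating trace
            for key in list(E):
                Q[key] += ALPHA * delta * E[key]
                E[key] *= GAMMA * LAMBDA
            if Q[(s2, a2)] < Q[(s2, a_star)]:                 # exploratory step:
                E.clear()                                     # Watkins cuts all traces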

    Heuristic Coordination in Cooperative Multi-Agent Reinforcement Learning

    No full text
    Key to reinforcement learning in multi-agent systems is the ability to exploit the fact that agents only directly influence a small subset of the other agents. Such loose couplings are often modelled using a graphical model: a coordination graph. Finding an (approximately) optimal joint action for a given coordination graph is therefore a central subroutine in cooperative multi-agent reinforcement learning (MARL). Much research in MARL focuses on how to gradually update the parameters of the coordination graph, whilst leaving the solving of the coordination graph to a known, typically exact and generic, subroutine. However, exact methods (e.g., Variable Elimination) do not scale well, and generic methods do not exploit the MARL setting of gradually updating a coordination graph and recomputing the joint action to select. In this paper, we examine what happens if we use a heuristic method, i.e., local search, to select joint actions in MARL, and whether we can use the outcome of this local search from a previous time-step to speed up and improve local search. We show empirically that by using local search, we can scale up to many agents and complex coordination graphs, and that by reusing joint actions from the previous time-step to initialise local search, we can both improve the quality of the joint actions found and the speed with which these joint actions are found.
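
    The procedure the abstract describes, best-response local search over a coordination graph warm-started from the previous time-step's joint action, can be sketched briefly. The graph topology and payoff tables below are invented; only the procedure reflects the paper:

        import random

        # A toy coordination graph: local payoff tables on the edges between agents.
        ACTIONS = (0, 1)
        EDGES = {  # (i, j) -> payoff[a_i][a_j]
            (0, 1): [[2.0, 0.0], [0.0, 3.0]],
            (1, 2): [[1.0, 4.0], [2.0, 0.0]],
            (2, 3): [[3.0, 1.0], [0.0, 2.0]],
        }
        AGENTS = sorted({i for edge in EDGES for i in edge})

        def value(joint):
            return sum(t[joint[i]][joint[j]] for (i, j), t in EDGES.items())

        def local_search(joint, max_sweeps=50):
            """Hill-climb: each agent in turn best-responds to the current joint action."""
            for _ in range(max_sweeps):
                improved = False
                for i in AGENTS:
                    best = max(ACTIONS, key=lambda a: value({**joint, i: a}))
                    if best != joint[i]:
                        joint[i], improved = best, True
                if not improved:
                    break                 # local optimum reached
            return joint

        joint = {i: random.choice(ACTIONS) for i in AGENTS}   # cold start once
        for t in range(3):
            # In MARL the payoff tables would be re-estimated between time-steps;
            # reusing `joint` here is the warm start the paper evaluates.
            joint = local_search(joint)
            print(t, joint, value(joint))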

    Learning to Coordinate with Coordination Graphs in Repeated Single-Stage Multi-Agent Decision Problems

    No full text
    Learning to coordinate between multiple agents is an important challenge in many reinforcement learning problems. Key to learning to coordinate is exploiting loose couplings, i.e., conditional independences between agents. In this paper we study learning in repeated fully cooperative games, multi-agent multi-armed bandits (MAMABs), in which the expected rewards can be expressed as a coordination graph. We propose multi-agent upper confidence exploration (MAUCE), a new algorithm for MAMABs that exploits loose couplings, which enables us to prove a regret bound that is logarithmic in the number of arm pulls and only linear in the number of agents. We empirically compare MAUCE to sparse cooperative Q-learning and a state-of-the-art combinatorial bandit approach, and show that it performs much better in a variety of settings, including learning control policies for wind farms.
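
    As rough intuition for how upper-confidence exploration can exploit a coordination graph, the sketch below keeps UCB-style estimates per local group and maximises their sum. This is a schematic simplification, not the published MAUCE algorithm; the groups, rewards, and brute-force maximisation are all illustrative assumptions:

        import itertools, math
        from collections import defaultdict

        GROUPS = [(0, 1), (1, 2)]           # each local reward touches two agents
        N_AGENTS, ACTIONS = 3, (0, 1)
        counts = defaultdict(int)           # (group, local_action) -> pull count
        means = defaultdict(float)          # (group, local_action) -> running mean

        def ucb(joint, t):
            total = 0.0
            for g in GROUPS:
                la = tuple(joint[i] for i in g)
                n = counts[(g, la)]
                if n == 0:
                    return math.inf         # try every local action at least once
                total += means[(g, la)] + math.sqrt(2.0 * math.log(t + 1) / n)
            return total

        def select(t):
            # Brute force over joint actions; the real algorithm exploits the
            # sparse graph so this set never has to be enumerated explicitly.
            joints = (dict(enumerate(c))
                      for c in itertools.product(ACTIONS, repeat=N_AGENTS))
            return max(joints, key=lambda j: ucb(j, t))

        def observe(joint, local_rewards):
            """Fold the observed per-group rewards into the running means."""
            for g, r in zip(GROUPS, local_rewards):
                key = (g, tuple(joint[i] for i in g))
                counts[key] += 1
                means[key] += (r - means[key]) / counts[key]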

    Multi-Agent Thompson Sampling for Bandit Applications with Sparse Neighbourhood Structures

    No full text
    Multi-agent coordination is prevalent in many real-world applications. However, such coordination is challenging due to its combinatorial nature. An important observation in this regard is that agents in the real world often only directly affect a limited set of neighbouring agents. Leveraging such loose couplings among agents is key to making coordination in multi-agent systems feasible. In this work, we focus on learning to coordinate. Specifically, we consider the multi-agent multi-armed bandit framework, in which fully cooperative loosely-coupled agents must learn to coordinate their decisions to optimize a common objective. We propose multi-agent Thompson sampling (MATS), a new Bayesian exploration-exploitation algorithm that leverages loose couplings. We provide a regret bound that is sublinear in time and low-order polynomial in the highest number of actions of a single agent for sparse coordination graphs. Additionally, we empirically show that MATS outperforms the state-of-the-art algorithm, MAUCE, on two synthetic benchmarks, and a novel benchmark with Poisson distributions. An example of a loosely-coupled multi-agent system is a wind farm. Coordination within the wind farm is necessary to maximize power production. As upstream wind turbines only affect nearby downstream turbines, we can use MATS to efficiently learn the optimal control mechanism for the farm. To demonstrate the benefits of our method toward applications, we apply MATS to a realistic wind farm control task. In this task, wind turbines must coordinate their alignments with respect to the incoming wind vector in order to optimize power production. Our results show that MATS improves significantly upon state-of-the-art coordination methods in terms of performance, demonstrating the value of using MATS in practical applications with sparse neighbourhood structures.
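
    The core Thompson-sampling idea that MATS builds on can be sketched as follows. This is a simplification rather than the paper's exact algorithm: the Bernoulli rewards, Beta(1, 1) priors, two groups, and brute-force maximisation over joint actions are all assumptions made for illustration:

        import itertools, random
        from collections import defaultdict

        GROUPS = [(0, 1), (1, 2)]
        N_AGENTS, ACTIONS = 3, (0, 1)
        alpha = defaultdict(lambda: 1.0)    # Beta posterior parameters per
        beta = defaultdict(lambda: 1.0)     # (group, local_action)

        def select():
            # Draw one sample from every local posterior, then play the joint
            # action maximising the summed samples.
            keys = {(g, la) for g in GROUPS
                    for la in itertools.product(ACTIONS, repeat=len(g))}
            sample = {k: random.betavariate(alpha[k], beta[k]) for k in keys}
            joints = (dict(enumerate(c))
                      for c in itertools.product(ACTIONS, repeat=N_AGENTS))
            # Brute force here; MATS exploits the sparse graph to avoid this.
            return max(joints, key=lambda j: sum(
                sample[(g, tuple(j[i] for i in g))] for g in GROUPS))

        def observe(joint, local_rewards):
            """Bayesian update from per-group Bernoulli rewards (0 or 1)."""
            for g, r in zip(GROUPS, local_rewards):
                k = (g, tuple(joint[i] for i in g))
                alpha[k] += r
                beta[k] += 1 - r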

    Estimating minimum adult HIV prevalence: A cross-sectional study to assess the characteristics of people living with HIV in Italy

    No full text
    In 2012, we conducted a retrospective cross-sectional study to assess the number of people living with HIV linked to care and, among these, the number of people on antiretroviral therapy. The health authority in each of the 20 Italian Regions provided the list of Public Infectious Diseases Clinics providing antiretroviral therapy and monitoring people with HIV infection. We asked every Public Infectious Diseases Clinic to report the number of HIV-positive people diagnosed and linked to care and the number of those on antiretroviral therapy during 2012. In 2012, 94,146 people diagnosed with HIV and linked to care were reported. The majority were males (70.1%), Italians (84.4%), and aged between 25 and 49 years (63.4%); the probable route of transmission was heterosexual contact in 37.5% of cases, injecting drug use in 28.1%, and male-to-male contact in 27.9%. Among people in care, 20.1% had less than 350 CD4 cells/μl, 87.6% received antiretroviral therapy, and among these, 62.4% had a CD4 cell count higher than 350 cells/μl. The overall estimated prevalence of individuals diagnosed and linked to care in 2012 in Italy was 0.16 per 100 residents (all ages). Adding the estimated proportion of undiagnosed people, the estimated HIV prevalence would range between 0.19 and 0.26 per 100 residents. In Italy, the majority of people diagnosed and linked to care receive antiretroviral therapy. A higher prevalence of individuals diagnosed and linked to care was observed in Northern Italy and among males. More information on the HIV care continuum is needed to improve engagement at every stage of care, focusing on test-and-treat strategies to substantially reduce the proportion of people who remain undiagnosed or have a detectable viral load.